1. INTRODUCTION

In the high-tech world, various types of medical tools such as electrocardiogram (ECG),
electromyography (EMG), computed tomography (CT), positron emission tomography (PET), ultrasound
(US), magnetic resonance imaging (MRI) and X-ray are routinely used to diagnose and analyze illness inside
the human body [1-3]. In medical algorithms, the massive amount of data to be processed affects the
organization of the memory units: more space is required to store both the data and the intermediate
results before they are used by the next process. Moreover, most of the sub-disciplines of medical
image processing rely on matrix transformation operations [4-6]. Thus, efficient implementation of
medical imaging algorithms is very challenging. To overcome these issues, FPGAs are well suited for
hardware implementation, while at the same time achieving better performance in terms of speed, size
and power [7].

A basic image compression system involves three processes: transformation, quantization and entropy
coding. An array of input pixels first goes through the transformation process, which outputs the
transformed coefficients. The transformed coefficients are then quantized to produce a finite number of
levels. Finally, entropy coding is applied to this finite set of numbers to give additional compression.
Technically, an image consists of smooth and sharp variations. The smooth variation is the base of the
image and represents the low-pass content. The sharp variation, on the other hand, represents the
high-pass content, and the two variations are added together to give a detailed image. In this context,
the purpose of the transformation process is to separate the smooth and sharp variations.

A wavelet acts as a mathematical tool for extracting information from different kinds of data, such as
audio signals and images [8]. Compared with the Discrete Cosine Transform (DCT), the Discrete Wavelet
Transform (DWT) provides more advantages and excellent coding gain for image processing applications.
The DWT uses a frame-based computation concept in which the image can be separated into many tiles,
where a larger tile size can avoid blocking artifacts.

In general, two types of design methodology can be used to synthesize and implement the proposed
architectures: the traditional method and the hybrid method. Both methods follow a similar procedure of
programming in a Hardware Description Language (HDL) using the Xilinx Integrated Software
Environment (ISE). The traditional method is a basic design flow that uses the HDL for FPGA synthesis
and implementation. The hybrid method, on the other hand, uses a combination of HDL and Graphical
Programming (G-code), which gives it an advantage over the traditional method.

In this paper, a Daubechies wavelet transform architecture is proposed and examined specifically for
the 3-D transform in medical image compression. The aim of this paper is to develop an efficient
reconfigurable architecture for the Daubechies wavelet transform using pipelined direct mapping with the
hybrid method. An evaluation in terms of area, power consumption and maximum frequency is also carried
out to analyze the performance of the proposed architectures.

The rest of the paper is organized as follows. An overview of the related work is given in Section 2. 
The mathematical background for the Daubechies wavelet transform is described in Section 3. Experimental
results and analysis are presented in Section 4. Concluding remarks and further potential ideas to be
explored are given in Section 5.


2. LITERATURE REVIEW 
Data on a computer is represented as a number of bits. For instance, an image containing 640x480
pixels (12-bit gray scale) needs more than 3 Megabits of storage. Therefore, transmitting this image via
conventional phone lines at a speed of 56 Kilobits per second takes more than one minute [9]. For this
reason, image compression is important for efficient data storage and data transmission [10-12]. In
general, the purpose of image compression is to reduce the size and blocking artifacts of the original
image without degrading its quality. Because of that, the development of an efficient image compression
technique is a most challenging matter. Moreover, compressing medical images is more challenging than
compressing non-medical images. This is because, for medical images, the compression algorithms are
complex and the images should ideally be stored in a lossless format, even though a lossy format is
sometimes acceptable [10, 12]. In addition, medical images are extremely rich in information content.
Therefore, there is a real need for high-performance systems, whilst keeping architectures flexible to
allow for quick upgradeability in real-time applications [4]. However, most of the existing works carry
out algorithm development and optimization [10], [13, 14] without a hardware implementation. Thus,
there is still a huge gap for further research on reconfigurable hardware for the 3-D transform in
medical image compression applications. Two major limitations of the existing works are identified as
follows:
a) Image compression has been extensively explored in [4, 6, 15]. However, medical image
compression, especially for 3-D modalities, is still an immature research area.
b) Although the family of Daubechies wavelet transforms has been widely used in 3-D DWT
implementations [10, 13], only a small number of works make use of the Daubechies 4-tap (Daub4) and
6-tap (Daub6) filters in a 3-D DWT implementation, which therefore requires further exploration.
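
The storage and transmission figures quoted at the start of this section follow from simple arithmetic. The short Python check below is only a sketch: it assumes an uncompressed image and an idealized link that sustains exactly 56 kbit/s with no protocol overhead, neither of which is stated in the text.

```python
# Uncompressed size of a 640x480 image at 12 bits per pixel, and the time
# to send it over an idealized 56 kbit/s link (no overhead: an assumption).
WIDTH, HEIGHT, BITS_PER_PIXEL = 640, 480, 12
LINK_BITS_PER_SECOND = 56_000

image_bits = WIDTH * HEIGHT * BITS_PER_PIXEL
transfer_seconds = image_bits / LINK_BITS_PER_SECOND

print(f"image size: {image_bits / 1e6:.2f} Mbit")
print(f"transfer time: {transfer_seconds:.1f} s")
```

Under these assumptions the image occupies about 3.69 Mbit and needs roughly 66 seconds on the link, which is what motivates compression before storage or transmission.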


3. MATHEMATICAL BACKGROUND FOR DAUBECHIES WAVELET TRANSFORM 

Each wavelet transform has its own algorithm, which includes the scaling and wavelet functions.
The only difference between the types of wavelet transform is how these functions are computed. The
Daubechies wavelet transform is defined by computing running averages and differences via scalar
products. Besides having longer supports, the Daubechies wavelet transform also offers compact support.
A smaller number of wavelet taps can be used to avoid the edge problem [10]. Thus, the Daubechies 4-tap
(Daub4) and 6-tap (Daub6) filters are used because their algorithms are simpler than those of the other
members of the Daubechies family.


3.1. Daub4 Algorithm
The Daub4 wavelet is the simplest wavelet in the Daubechies wavelet family.
In general, Daub4 has four scaling coefficients and four wavelet coefficients, as given in (1) and (2),
respectively.
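
For reference, the standard Daub4 values can be written as follows. The symbols $h_k$ (scaling) and $g_k$ (wavelet), the 0-based indexing and the alternating-flip sign convention are assumptions made here to match the surrounding text; other sign conventions are also in use.

```latex
\begin{align*}
h_0 &= \frac{1+\sqrt{3}}{4\sqrt{2}}, &
h_1 &= \frac{3+\sqrt{3}}{4\sqrt{2}}, &
h_2 &= \frac{3-\sqrt{3}}{4\sqrt{2}}, &
h_3 &= \frac{1-\sqrt{3}}{4\sqrt{2}}, \\
g_0 &= h_3, &
g_1 &= -h_2, &
g_2 &= h_1, &
g_3 &= -h_0 .
\end{align*}
```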
The scaling and wavelet functions are calculated by taking the inner product of the coefficients and
the input data values. In the last iteration, the input data $s[N]$ and $s[N+1]$ do not exist. In other
words, the scaling coefficients $(h_0, \ldots, h_3)$ and wavelet coefficients $(g_0, \ldots, g_3)$ have
length 4, so the last iteration would send $(h_2, h_3)$ and $(g_2, g_3)$ beyond the end of the array of
the input signal. This situation is known as the edge problem, which occurs in every member of the
Daubechies wavelet family.
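
The inner-product computation and the edge problem described above can be sketched in a few lines of Python. This is an illustrative sketch rather than the paper's implementation: the function name `daub4_level1` is hypothetical, the coefficients are the standard Daub4 values, and the periodic wrap-around is one common way to supply the missing samples $s[N]$ and $s[N+1]$.

```python
import math

SQRT2, SQRT3 = math.sqrt(2.0), math.sqrt(3.0)
# Standard Daub4 scaling coefficients h0..h3.
H = [(1 + SQRT3) / (4 * SQRT2), (3 + SQRT3) / (4 * SQRT2),
     (3 - SQRT3) / (4 * SQRT2), (1 - SQRT3) / (4 * SQRT2)]
# Wavelet coefficients via the usual alternating flip: g_k = (-1)^k * h_{3-k}.
G = [H[3], -H[2], H[1], -H[0]]


def daub4_level1(signal):
    """Return (averages, differences) of a 1-level Daub4 transform.

    Indices past the end of the signal wrap around to the start, i.e. the
    signal is treated as periodic (an assumed answer to the edge problem).
    """
    n = len(signal)
    if n < 4 or n % 2:
        raise ValueError("signal length must be even and at least 4")
    averages, differences = [], []
    for start in range(0, n, 2):
        # In the last iteration (start = n - 2) the window wraps around.
        window = [signal[(start + k) % n] for k in range(4)]
        averages.append(sum(h * s for h, s in zip(H, window)))
        differences.append(sum(g * s for g, s in zip(G, window)))
    return averages, differences


a, d = daub4_level1([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0])
print(a)
print(d)
```

For this linear ramp the first three differences vanish up to rounding, while the last one does not, because its window wraps from the end of the signal back to the start.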


3.2. Daub6 Algorithm 
The Daub6 wavelet is the most localized member of the Daubechies wavelet family. It has six scaling
coefficients and six wavelet coefficients, as given in (3) and (4), respectively, where
$z_1 = \sqrt{10}$ and $z_2 = \sqrt{5 + 2\sqrt{10}}$.
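
For reference, with $z_1 = \sqrt{10}$ and $z_2 = \sqrt{5 + 2\sqrt{10}}$, the standard Daub6 values can be written as follows. The symbols $h_k$ and $g_k$, the 0-based indexing and the sign convention for the wavelet coefficients are assumptions made here, not notation confirmed by the text.

```latex
\begin{align*}
h_0 &= \frac{1 + z_1 + z_2}{16\sqrt{2}}, &
h_1 &= \frac{5 + z_1 + 3z_2}{16\sqrt{2}}, &
h_2 &= \frac{10 - 2z_1 + 2z_2}{16\sqrt{2}}, \\
h_3 &= \frac{10 - 2z_1 - 2z_2}{16\sqrt{2}}, &
h_4 &= \frac{5 + z_1 - 3z_2}{16\sqrt{2}}, &
h_5 &= \frac{1 + z_1 - z_2}{16\sqrt{2}}, \\
g_k &= (-1)^k \, h_{5-k}, & k &= 0, \ldots, 5 .
\end{align*}
```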
The 1-level Daub6 scaling and wavelet functions are defined in the same way as for the Daub4 wavelet
transform. In the last iteration, the scaling coefficients $(h_2, h_3, h_4, h_5)$ and wavelet
coefficients $(g_2, g_3, g_4, g_5)$ would be sent beyond the end of the array. As with the Daub4 wavelet
transform, the edge problem is handled by treating the data set as if it were periodic.
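
A minimal Python sketch of the Daub6 case follows. It builds the standard Daub6 coefficients from $z_1$ and $z_2$ and applies them with periodic edge handling. The names `H6`, `G6` and `periodic_level1` are hypothetical, and the alternating-flip sign convention for the wavelet coefficients is an assumption.

```python
import math

z1 = math.sqrt(10.0)
z2 = math.sqrt(5.0 + 2.0 * z1)
norm = 16.0 * math.sqrt(2.0)
# Standard Daub6 scaling coefficients h0..h5 written in terms of z1 and z2.
H6 = [(1 + z1 + z2) / norm, (5 + z1 + 3 * z2) / norm,
      (10 - 2 * z1 + 2 * z2) / norm, (10 - 2 * z1 - 2 * z2) / norm,
      (5 + z1 - 3 * z2) / norm, (1 + z1 - z2) / norm]
# Wavelet coefficients by the usual alternating flip: g_k = (-1)^k * h_{5-k}.
G6 = [(-1) ** k * H6[5 - k] for k in range(6)]


def periodic_level1(signal, h, g):
    """1-level transform for an even-length filter pair, periodic edges."""
    n, taps = len(signal), len(h)
    if n % 2 or n < taps:
        raise ValueError("signal length must be even and at least the tap count")
    averages, differences = [], []
    for start in range(0, n, 2):
        # The last windows reach past the end and wrap to the beginning.
        window = [signal[(start + k) % n] for k in range(taps)]
        averages.append(sum(c * s for c, s in zip(h, window)))
        differences.append(sum(c * s for c, s in zip(g, window)))
    return averages, differences


samples = [float(k * k) for k in range(8)]  # quadratic test signal
avg6, diff6 = periodic_level1(samples, H6, G6)
print(diff6)
```

With eight samples and six taps, the last two windows wrap around, which corresponds to the four coefficients that would otherwise fall beyond the end of the array.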


4. RESULTS AND FINDINGS 

The proposed architectures address the transformation algorithm for medical image compression
applications. To reduce the complexity of the hardware requirements, the 3-D Daub4/Daub6
architecture is divided into three 1-D Daub4/Daub6 modules with two transpose modules in between.

The purpose of the transpose module is to deliver its output in a different order from the input data.
Thus, the need for three different 1-D Daub4/Daub6 modules that perform the computations along the
rows, columns and N sub-images, respectively, is eliminated in favor of a single operation: each 1-D
Daub4/Daub6 module performs the same operation.
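
The decomposition described above can be sketched in plain Python: the same row-wise 1-D pass is applied three times, with an axis reordering between passes so that each pass sees a different dimension as its rows. The function names, the nested-list volume layout and the specific axis rotation are illustrative assumptions, not details taken from the proposed hardware.

```python
import math

S2, S3 = math.sqrt(2.0), math.sqrt(3.0)
H = [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2),
     (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)]
G = [H[3], -H[2], H[1], -H[0]]


def daub4_row(row):
    """1-level Daub4 of one row (periodic edges): averages, then differences."""
    n = len(row)
    lo = [sum(H[k] * row[(i + k) % n] for k in range(4)) for i in range(0, n, 2)]
    hi = [sum(G[k] * row[(i + k) % n] for k in range(4)) for i in range(0, n, 2)]
    return lo + hi


def transform_rows(volume):
    """Apply the same row-wise 1-D transform to every row of every plane."""
    return [[daub4_row(row) for row in plane] for plane in volume]


def transpose(volume):
    """Reorder v[a][b][c] into t[b][c][a]; the old first axis becomes the rows."""
    na, nb, nc = len(volume), len(volume[0]), len(volume[0][0])
    return [[[volume[a][b][c] for a in range(na)]
             for c in range(nc)] for b in range(nb)]


def daub4_3d(volume):
    """Three identical row-wise passes with two transposes in between."""
    stage1 = transform_rows(volume)
    stage2 = transform_rows(transpose(stage1))
    return transform_rows(transpose(stage2))  # axes end up rotated twice


volume = [[[1.0] * 4 for _ in range(4)] for _ in range(4)]
out = daub4_3d(volume)
print(out[0][0][0])
```

Because only the data order changes between passes, a single 1-D datapath design can be reused for all three dimensions. Note that in this sketch the output axes are left in rotated order.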


4.1. FPGA-based Daub4 Architecture 

Figure 1 depicts the 1-D Daub4 flow diagram with N input samples for the pipelined direct mapping
implementation. It comprises multipliers, shifters, registers and adders, with the notation 'Mul.',
'Shift.' and 'Add.' for multiplier, shifter and adder, respectively. Since the Daubechies wavelet family
introduces the edge problem, the shifter is used to wrap the input data around to the beginning.
Moreover, the computation of the Daub4 algorithm requires eight multipliers at each stage to compute
the inner products between the input data and the Daub4 coefficients. Each multiplier has a fixed
wavelet or scaling function coefficient that is multiplied with the input data sample. The two-input
adders, in turn, are used to calculate the summation of the inner products.

Figure 1. 1-D Daub4 pipelined architecture (N input samples for every single row)
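
The roles of the 'Shift.', 'Mul.' and 'Add.' blocks can be mimicked in software to count the arithmetic units used per stage. The sketch below is a behavioral model only. Modeling the shifter as a rotation, arranging the two-input adders as a small tree, and all function names are assumptions, and pipeline registers are not modeled.

```python
import math

S2, S3 = math.sqrt(2.0), math.sqrt(3.0)
H = [(1 + S3) / (4 * S2), (3 + S3) / (4 * S2),
     (3 - S3) / (4 * S2), (1 - S3) / (4 * S2)]
G = [H[3], -H[2], H[1], -H[0]]


def shift(samples, amount):
    """'Shift.' block: rotate left so samples past the end wrap to the start."""
    amount %= len(samples)
    return samples[amount:] + samples[:amount]


def add2(x, y, counter):
    """'Add.' block: a two-input adder; counter tallies how many are used."""
    counter["add"] += 1
    return x + y


def stage(window, counter):
    """One stage: eight fixed-coefficient multipliers feeding two adder trees."""
    low = [h * s for h, s in zip(H, window)]   # 4 'Mul.' blocks (scaling)
    high = [g * s for g, s in zip(G, window)]  # 4 'Mul.' blocks (wavelet)
    counter["mul"] += len(low) + len(high)
    average = add2(add2(low[0], low[1], counter),
                   add2(low[2], low[3], counter), counter)
    difference = add2(add2(high[0], high[1], counter),
                      add2(high[2], high[3], counter), counter)
    return average, difference


row = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
usage = {"mul": 0, "add": 0}
outputs = [stage(shift(row, start)[:4], usage) for start in range(0, len(row), 2)]
print(usage)
```

For this eight-sample row there are four stages, so the tally comes to eight multiplications per stage, in line with the text, plus six two-input additions per stage under the assumed adder-tree arrangement.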


4.2. FPGA-based Daub6 Architecture 

Since the Daub6 algorithm has six scaling and six wavelet coefficients, the 1-D Daub6 flow diagram for
the pipelined direct mapping implementation requires twelve multipliers at each stage, as depicted in
Figure 2. Besides the multipliers, the pipelined implementation also uses shifters, adders and registers.
The 1-D Daub6 flow diagram works in a similar way to the 1-D Daub4 flow diagram: the input data is first
shifted to handle the edge problem. After that, the input data is multiplied with the scaling and
wavelet function coefficients before the adders calculate the total inner product at that stage.

Figure 2. 1-D Daub6 pipelined architecture (N input samples for every single row)


4.3. Implementation Results 

The two proposed architectures were described in VHDL, synthesized and implemented on the Xilinx FPGA
of a single-board RIO (sbRIO 9632). Three parameters were selected to evaluate the performance of the
proposed pipelined architectures: area (slices), maximum frequency (MHz) and power consumption (mW).
Table 1 summarizes the implementation results for the Daub4 and Daub6 pipelined architectures.


Table 1. Implementation Results

Parameters                  Proposed Pipelined Architectures
                            Daub4            Daub6
Area (Slices)               2,731 (13.3%)    2,968 (14.5%)
Maximum Frequency (MHz)     36.55            33.00
Power Consumption (mW)      142              158





The Daub4 architecture requires less area and consumes 142 mW at a maximum frequency of 36.55 MHz.
In comparison with Daub6, it can clearly be seen that the Daub4 implementation requires a less
complicated mapping. This is due to the more complex algorithm and the edge problem that occurs with
the Daubechies wavelet transform. Moreover, the multiplications in the Daub6 implementation consume
more resources; hence Daub6 requires 1.2 percentage points more area (237 additional slices) and 16 mW
more power, at a maximum clock frequency of 33 MHz.
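
The differences between the two architectures follow directly from the Table 1 figures. The short check below only restates that arithmetic; the dictionary layout and variable names are illustrative.

```python
# Figures copied from Table 1.
daub4 = {"slices": 2731, "area_pct": 13.3, "fmax_mhz": 36.55, "power_mw": 142}
daub6 = {"slices": 2968, "area_pct": 14.5, "fmax_mhz": 33.00, "power_mw": 158}

extra_slices = daub6["slices"] - daub4["slices"]
extra_area_points = round(daub6["area_pct"] - daub4["area_pct"], 1)
extra_power_mw = daub6["power_mw"] - daub4["power_mw"]
# Relative loss in maximum frequency when moving from Daub4 to Daub6.
fmax_drop_pct = 100 * (daub4["fmax_mhz"] - daub6["fmax_mhz"]) / daub4["fmax_mhz"]

print(extra_slices, extra_area_points, extra_power_mw, round(fmax_drop_pct, 1))
```

In other words, Daub6 costs 237 more slices (1.2 percentage points of the device), 16 mW more power and a little under 10% of the maximum clock frequency.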
Even though the Daub6 architecture requires more area, it has more vanishing moments, which results in
better signal approximation. Thus, using the Daub6 wavelet filter in the proposed architecture yields
better reconstruction quality for the CT image. Figure 3 shows the implementation results for the
proposed architectures using a CT image.





Figure 3. Implementation results with CT image 


5. CONCLUSION 

Two architectures for the 3-D Daub4 and Daub6 transforms, based on transpose computation, have been
proposed in this paper for the transform block of medical image compression. In addition, a CT image,
one of the 3-D medical imaging modalities, is used as the input for the compression system.

A comparative study of the proposed architectures reveals that the Daub4 wavelet filter achieves better
implementation results. On the other hand, the Daub6 wavelet filter gives better reconstruction quality
for the CT image. Ongoing research is focusing on the design and FPGA implementation of the 3-D DWT
using various types of wavelet filters and different design strategies.